Academic misconduct is unethical behavior in academic work. To sustain a 
culture of integrity and mitigate unethical conduct in higher education 
institutions, academic misconduct must be detected at an early stage. This 
study therefore provides a new empirical contribution by analyzing binary 
classification performance metrics to describe the ability of machine 
learning to predict academic misconduct. Four machine learning algorithms 
were used, namely generalized linear model (GLM), logistic regression (LR), 
decision tree (DT), and random forest (RF). Besides the performance 
comparison, this paper presents an analysis of academic misconduct factors 
constructed from demography and fraud triangle theory (FTT). The findings 
showed that all four machine learning algorithms produced prediction models 
with accuracy above 80% and classification error below 20%. Rationalization 
was the most important FTT attribute in GLM, LR, and DT, whereas opportunity 
was the most important in RF. Compared with the FTT attributes, the 
demography attributes contributed little to the machine learning models but 
remained applicable at very low correlation weights. 
1. INTRODUCTION 


Machine learning techniques have been utilized in the field of education for predicting academic 
misconduct [1]-[4]. Academic misconduct, often referred to as academic dishonesty, is a global problem. 
It is defined as purposeful fraud [5] as well as a specific form of regulation violation in 
higher education institutions [6]. Plagiarism, exam or test cheating, unauthorized collaboration, and 
fabrication are a few examples. Recently, incidents of academic misconduct have become more prevalent due to 
the implementation of emergency remote teaching to curb the spread of COVID-19 [7], which in 
turn raises the need for automated machine learning in academic misconduct prediction studies to 
achieve more accurate outcomes. The literature documents various risk factors associated with the 
occurrence of academic misconduct, such as personality traits [8], individual and situational factors [9], [10], 
ethical orientation [11], religiosity [12], and fraud theory factors [13]. Predicting academic misconduct is 
challenging, but if it can be detected at an early stage, preventive measures can be taken more 
effectively. 

In the education domain, machine learning techniques play a major role in predicting various 
academic problems such as student academic performance [14]-[16] and dropout [17]-[20]. 
Despite the importance of machine learning techniques for predicting academic misconduct more accurately, 
the literature contains very few studies in this area [1]-[4], as most prior studies employed traditional 
statistical methods to predict such unethical behavior [8]-[13]. Kamalov et al. [1] is one of 
the studies that use a machine learning technique, a recurrent neural network (RNN), together with an outlier 
detection method to predict exam cheating. In particular, that study uses the RNN to identify unexpectedly high 
final exam scores for an average student; the anomalous grades then serve as input to the outlier detection 
method, which identifies potential academic cheating. Overall, the findings show that the method significantly 
outperforms the benchmark method, achieving an average true positive rate (TPR) of 0.95 and false 
positive rate (FPR) of 0.05 in the classification results. Further, Wray et al. [2] aim to predict the propensity 
for academic dishonesty using decision tree (DT) analysis. The findings show that DT analysis complements the 
traditional probit regression approach in terms of predictive accuracy. In addition, the results suggest that 
students’ moral character is the most important factor in determining the propensity for academic dishonesty. 
In line with [1], [2], the study in [3] constructs a machine learning detector of academic cheating in the form 
of copying answers using multiple existences online (CAMEO). The prediction model is based on three categories 
of features, namely student features, problem features, and submission features. Using a Bayesian network, the 
model shows high performance, with an area under the curve (AUC) close to 1 and a sensitivity and specificity of 
0.96 and 0.99, respectively. The findings reveal that student features are more important than problem 
features and submission features. Tiong and Lee [4] employed four deep learning algorithms, namely deep neural 
network (DNN), DenseLSTM, long short-term memory (LSTM), and RNN, to develop prediction models for 
online exam cheating. Using two exam datasets (mid-term and final-term exams) from Pyeongtaek University in 
South Korea, the results revealed accuracies of 68% for the DNN, 92% for the LSTM, 95% for the 
DenseLSTM, and 86% for the RNN. 

A review of the prior studies shows that the performance of existing machine learning approaches to academic 
misconduct prediction has been examined only to a limited extent. Hence, this study aims to add to the existing 
body of knowledge [1]-[4] by investigating the use of a machine learning classification approach for predicting 
academic misconduct among undergraduate students of higher education institutions in Malaysia. Following prior 
work [13], this study uses the fraud triangle theory (FTT) factors of pressure, opportunity, and rationalization 
to predict academic misconduct incidence in a unique setting: emergency remote teaching during the COVID-19 pandemic. 

This study makes two major contributions. First, it extends previous work on 
academic misconduct prediction using machine learning techniques [1]-[4] by presenting evidence from a 
machine learning-based academic misconduct prediction model among Malaysian undergraduate students. 
To the best of our knowledge, machine learning prediction studies on academic misconduct have 
reported limited evaluation metrics that do not highlight the confusion matrix, precision, and recall. 
Second, it presents a new design and execution of machine learning prediction of academic misconduct 
in which FTT constructs are compared with demography constructs. The rest of the paper is organized as 
follows: section 2 describes the dataset and the machine learning experimental setting, section 3 presents 
and discusses the empirical findings for each algorithm, and section 4 provides the summary and 
conclusions. 


2. METHOD 
2.1. Data collection and datasets 

This study employed a questionnaire to collect the dataset for constructing the machine 
learning prediction model of academic misconduct. The survey was distributed to undergraduate accounting 
students of Malaysian higher education institutions during the implementation of emergency remote teaching. 
The questionnaire consists of two sections designed to acquire information on the students’ 
demographics (gender, attitude toward learning, health status, peer academic misconduct, and academic 
misconduct experiences) and on their perception of the FTT attributes (pressure, opportunity, and 
rationalization) [21]. This study uses 6, 8, and 5 indicators to measure pressure, 
opportunity, and rationalization, respectively. The mean of the indicator scores represents 
each FTT attribute, as in (1) to (3). Five indicators gauge students' experiences of engaging in academic 
misconduct, which is the dependent variable (DV), as in (4). The misconduct includes asking for external 
assistance, exchanging responses during online testing, plagiarizing, illicit collaboration, and searching for 
answers on the internet through discussion or forum groups. The mean of academic misconduct experience is the 
target variable of the prediction model. If the mean misconduct score of a student is greater than 2.5, the 
student is labeled as 1 to represent academic dishonesty; otherwise the student is labeled as 0. Out of a total 
of 200 questionnaires distributed, 108 valid responses (54%) were used for the analysis. 


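\[ \text{Pressure} = \frac{\sum_{i=1}^{6} \text{indicator}_i}{6} \tag{1} \]

\[ \text{Opportunity} = \frac{\sum_{i=1}^{8} \text{indicator}_i}{8} \tag{2} \]

\[ \text{Rationalization} = \frac{\sum_{i=1}^{5} \text{indicator}_i}{5} \tag{3} \]

\[ \text{Academic misconduct} = \frac{\sum_{i=1}^{5} \text{indicator}_i}{5} \tag{4} \]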
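
The scoring in (1) to (4) and the 2.5 labeling threshold can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the study's actual pipeline: the function names and the example responses are hypothetical, and only the indicator counts and the threshold come from the text.

```python
from statistics import mean

# Number of questionnaire indicators per construct, as stated in Section 2.1.
INDICATOR_COUNTS = {"pressure": 6, "opportunity": 8, "rationalization": 5, "misconduct": 5}
THRESHOLD = 2.5  # a mean misconduct score above this is labeled 1 (dishonesty)


def construct_score(responses, construct):
    """Mean of the indicator responses for one construct, as in (1) to (4)."""
    expected = INDICATOR_COUNTS[construct]
    if len(responses) != expected:
        raise ValueError(f"{construct} expects {expected} indicators, got {len(responses)}")
    return mean(responses)


def misconduct_label(responses):
    """Return 1 (academic dishonesty) if the mean exceeds the threshold, else 0."""
    return 1 if construct_score(responses, "misconduct") > THRESHOLD else 0


# Hypothetical student: invented responses, for illustration only.
student = {
    "pressure": [3, 4, 2, 3, 4, 2],
    "misconduct": [1, 2, 2, 3, 2],
}
pressure = construct_score(student["pressure"], "pressure")  # (3+4+2+3+4+2)/6
label = misconduct_label(student["misconduct"])              # mean 2.0, so label 0
```

Checking the number of responses against the stated indicator count guards against averaging an incomplete questionnaire section.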


2.2. Correlations of variables 

Table 1 lists the independent variables (IVs) from the demography and FTT attributes. The DV is the 
class of academic misconduct, either dishonesty or honesty, represented as 1 and 0 as given in Table 2. The 
distribution percentages show the share of samples in each class; academic honesty is far more frequent 
than academic dishonesty. It is therefore of interest to observe how this imbalance 
affects the ability of machine learning to predict cases of academic dishonesty. 


Table 1. Pearson correlation of each IV to the DV 


Attribute Correlation coefficient 
Pressure 0.341 
Rationalization 0.304 
Opportunity 0.265 
Health 0.237 
Learning attitude 0.277 
Peer academic misconduct 0.071 
Gender 0.048 


Table 2. The DV of the classification model 


Class Data representation Distribution (%) 
Academic dishonesty 1 (true) 13.08 
Academic honesty 0 (false) 86.92 


Based on the Pearson correlation test, most of the attributes have a low correlation coefficient with the DV, 
and two demography attributes (peer academic misconduct and gender) have a very low dependency with the DV 
(below 0.1). However, in machine learning prediction, each attribute, even one with very low 
influence, is expected to provide some degree of knowledge to the algorithm. Therefore, all 
attributes were retained in all machine learning models. What matters most is how much, and how 
differently, each attribute contributes in the different machine learning algorithms. 
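
As an illustration of how a coefficient such as those in Table 1 is obtained, the following sketch computes the Pearson correlation between one attribute and the 0/1 class label. The `pearson` helper and the sample values are hypothetical and are not the study's data; with a binary DV this quantity is also known as the point-biserial correlation.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Hypothetical data: one attribute score per student and the 0/1 class label.
rationalization = [1.2, 2.0, 3.4, 2.8, 4.0, 1.6]
dishonesty = [0, 0, 1, 0, 1, 0]
r = pearson(rationalization, dishonesty)
print(f"r = {r:.3f}")
```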


2.3. Machine learning 

Four machine learning algorithms, namely generalized linear model (GLM) [22], logistic regression 
(LR) [23], DT [24], and random forest (RF) [25], were selected for comparison in this study. These four 
algorithms were selected based on preliminary findings from the AutoModel module in the RapidMiner 
software, which uses an optimization search strategy to identify suitable algorithms for the given dataset. Table 3 
lists the optimal parameter sets of DT and RF from the preliminary hyper-parameter tuning. 

For the DT, the maximal depth in the preliminary testing ranged from 2 to 25, with a 
consistent error rate of 12.5% for all settings. Therefore, the smallest maximal depth, 2, was adopted for the 
algorithm. For the RF, the numbers of trees used in the preliminary hyper-parameter tuning were 20, 60, 100, and 
140. For each of the four numbers of trees, three values of maximal depth (2, 4, and 7) were tested. 
The worst error rate was 18.8%, obtained with 20 trees and a maximal depth of 4. 
The best error rate was 10.9%, obtained with the configuration given in Table 3. 

Figure 1 depicts the RapidMiner process for splitting the dataset into training and testing sets. As 
seen in the ratio field, the research used a 0.7:0.3 training-to-testing ratio. Therefore, of the 108 records, 76 
were used for machine learning training and 32 were used as a hold-out sample for machine 
learning testing. 
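
The arithmetic of the 0.7:0.3 hold-out split can be sketched as follows. This is an assumption-laden illustration rather than the RapidMiner process itself: the `split_indices` helper, the random shuffling, and the seed are hypothetical, since the sampling type used by the split operator is not stated here.

```python
import random


def split_indices(n_rows, train_ratio=0.7, seed=0):
    """Shuffle row indices and split them into training and hold-out sets."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)  # assumed shuffled sampling
    n_train = round(n_rows * train_ratio)  # 108 * 0.7 = 75.6, rounded to 76
    return indices[:n_train], indices[n_train:]


train_idx, test_idx = split_indices(108)
print(len(train_idx), len(test_idx))  # 76 32
```

With a class as rare as academic dishonesty, a purely random split may leave very few positive cases in either partition, which is worth keeping in mind when reading the per-class results.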


Table 3. Configuration of parameters 


Algorithm  Optimal parameters                      Error rate (%) 
DT         Maximal depth=2                         12.5 
RF         Number of trees=100, Maximal depth=2    10.9 


Figure 1. Process for split ratio 


2.4. Performances metrics 

Because the machine learning algorithms predict the probability of two classes of 
academic misconduct, the models were evaluated with classification metrics calculated from the 
confusion matrix depicted in Figure 2. In the context of academic misconduct, the cells of the matrix 
are: i) true positive (TP): the number of academic dishonesty cases correctly 
classified, ii) true negative (TN): the number of academic honesty cases correctly classified, iii) false 
positive (FP): the number of academic honesty cases incorrectly classified as dishonesty, and iv) false negative 
(FN): the number of academic dishonesty cases incorrectly classified as honesty. Based on the confusion matrix in 
Figure 2, the metrics for measuring machine learning performance are accuracy, classification error, 
recall, and precision. Accuracy and classification error measure the performance of the machine learning in 
detecting both classes (1, 0) over the total validation cases, whereas recall and precision present the 
ability to detect each specific class. The formulas for accuracy and classification error are given in (5) and (6): 


\[ \text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{5} \]

\[ \text{Classification error} = \frac{FP + FN}{TP + TN + FP + FN} \tag{6} \]

The sensitivity of machine learning in predicting academic dishonesty 
(class 1), or recall, is denoted in (7). Recall for class 1 is the TPR, the proportion of actual 
academic dishonesty cases that are correctly predicted. Precision for class 1 complements recall: it is 
the proportion of cases predicted as academic dishonesty that are truly dishonest. The formula for precision is 
denoted in (8). The same two measures can be computed for class 0 by treating academic honesty as the class of interest. 

\[ \text{Sensitivity} = \text{Recall} = \text{TPR} = \frac{TP}{TP + FN} \tag{7} \]

\[ \text{Precision} = \frac{TP}{TP + FP} \tag{8} \]
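
The four metrics can be computed directly from the confusion-matrix counts. The sketch below follows the standard definitions of accuracy, classification error, recall, and precision for class 1; the function name and the example counts are hypothetical and do not come from the study.

```python
def classification_metrics(tp, tn, fp, fn):
    """Accuracy, classification error, recall and precision for class 1."""
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("confusion matrix is empty")
    return {
        "accuracy": (tp + tn) / total,
        "classification_error": (fp + fn) / total,
        # Recall is undefined when there are no actual positives; None marks that case.
        "recall": tp / (tp + fn) if (tp + fn) else None,
        # Precision is undefined when nothing is predicted positive.
        "precision": tp / (tp + fp) if (tp + fp) else None,
    }


# Hypothetical counts for illustration only.
m = classification_metrics(tp=8, tn=80, fp=2, fn=10)
print(m)
```

Returning `None` for an undefined ratio is one possible design choice; some tools report 0 in that situation instead.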
Figure 2. Confusion matrix for the academic dishonesty model (predicted classes: classified as 1, classified as 0) 


3. RESULTS AND DISCUSSION 

Three sets of results are presented. The first set is the performance of the 
machine learning algorithms in correctly (accuracy) and incorrectly (classification error) classifying both 
classes of academic misconduct over the total validation cases, as provided in Table 4. TTC is the time to 
complete, from the training to the validation stage, in milliseconds. 


Table 4. The performance results 


Algorithm  Accuracy (%)  Classification error (%)  TTC (ms) 
GLM        80.5          19.5                      339 
LR         83.8          16.2                      303 
DT         87.6          12.4                      410 
RF         87.6          12.4                      3,000 


In general, all machine learning algorithms achieved good accuracy (above 80%) with 
low error (below 20%), particularly DT and RF, which use a tree-based paradigm to 
construct the classification model. DT and RF achieved equal accuracy, 
but DT had a lower TTC than RF. Although RF took the longest time, the process still 
completed in just 3 seconds. RF is structurally more complex than DT because it builds multiple trees rather 
than a single tree, which causes it to take much more time than the other algorithms. 

The second set of results comprises the precision and recall for each class of academic misconduct, 
measured from the confusion matrices (labeled as in Figure 2) generated by each machine 
learning algorithm and listed in Table 5. As expected, the class precision and recall for detecting academic 
dishonesty are lower in all machine learning algorithms than those for the academic honesty 
class. However, even with the very small number of academic dishonesty cases available for 
training, the precision of GLM, LR, and RF for this class is reasonably good 
(50-75%). DT predicted no case as academic dishonesty, probably because it encountered too few dishonesty 
cases during the training stage, which resulted in 0% precision and recall for class 1. 


Table 5. Confusion matrices of GLM, LR, DT, and RF 


                                  Real academic dishonesty  Real academic honesty  Class precision (%) 

GLM 
Predicted as academic dishonesty  2                         2                      50.0 
Predicted as academic honesty     4                         23                     85.2 
Class recall                      33.33%                    92.0% 

LR 
Predicted as academic dishonesty  1                         1                      50.0 
Predicted as academic honesty     4                         25                     86.2 
Class recall                      20.0%                     96.1% 

DT 
Predicted as academic dishonesty  0                         0                      0.0 
Predicted as academic honesty     4                         27                     87.0 
Class recall                      0.0%                      100.0% 

RF 
Predicted as academic dishonesty  3                         1                      75.0 
Predicted as academic honesty     3                         24                     88.9 
Class recall                      50.0%                     96.0% 
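
A short calculation helps interpret the DT block of Table 5. A model that assigns every case to the honesty class never predicts class 1, so its accuracy equals the share of honest cases in the evaluated sample while its recall for dishonesty is zero. The sketch below uses the DT counts from Table 5; the variable names are illustrative assumptions.

```python
# DT confusion-matrix counts from Table 5 (class 1 = academic dishonesty).
dt = {"tp": 0, "fp": 0, "fn": 4, "tn": 27}

total = sum(dt.values())                          # 31 evaluated cases
accuracy = (dt["tp"] + dt["tn"]) / total          # 27 / 31
majority_share = (dt["fp"] + dt["tn"]) / total    # share of truly honest cases
recall_class1 = dt["tp"] / (dt["tp"] + dt["fn"])  # 0 / 4

print(f"accuracy={accuracy:.3f}, majority share={majority_share:.3f}, "
      f"recall(class 1)={recall_class1:.1f}")
```

Here the accuracy and the majority share coincide, which illustrates why class precision and recall are reported alongside accuracy when the classes are imbalanced.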
Lastly, the third set of results explains how each attribute from demography and FTT was used by 
the different machine learning algorithms. Table 6 lists the correlation weights 
that each algorithm assigned to the attributes for the academic misconduct prediction. In general, the 
rationalization attribute from FTT was the most important to GLM, LR, and DT, but in RF the 
opportunity attribute was the highest. The research findings indicate that the rationalization attribute of the 
FTT becomes significant when students attempt to rationalize their academic misconduct by providing 
self-justifications. To illustrate, students may persuade themselves that engaging in cheating or plagiarism is 
justified due to various factors such as the pressure to attain high grades, an excessive workload, the 
prevalence of such behavior among peers, and the perception of an arbitrary grading system [13]. 


Table 6. The correlation weights of each academic misconduct attribute 


Attributes                GLM    LR     DT     RF 

FTT 
Pressure                  0.022  0.006  0.006  0.038 
Opportunity               0.044  0.056  0.009  0.083 
Rationalization           0.059  0.081  0.247  0.044 

Demography 
Gender                    0.014  0.037  0.042  0.034 
Health                    0.028  0.020  0.013  0.035 
Learning attitude         0.046  0.061  0.043  0.019 
Peer academic misconduct  0.006  0.060  0.004  0.029 


Among the demography attributes, the weights vary within a similar range, and 
learning attitude is the second most important attribute in GLM, LR, and DT after rationalization. Health, which 
has one of the higher correlation coefficients among the demography attributes outside the machine learning 
models (refer to Table 1), receives the highest weight among the demography attributes in RF. Gender and peer 
academic misconduct, which have the lowest correlation coefficients in Table 1, generally remain among the less 
influential attributes in the machine learning models, although their weights vary by algorithm (for example, 
peer academic misconduct receives 0.060 in LR). 
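
The attribute rankings discussed above can be reproduced from the Table 6 values with a few lines of Python. The weights are transcribed from Table 6; the `ranking` helper and the data layout are illustrative assumptions.

```python
# Attribute weights transcribed from Table 6 (columns: GLM, LR, DT, RF).
WEIGHTS = {
    "Pressure":                 (0.022, 0.006, 0.006, 0.038),
    "Opportunity":              (0.044, 0.056, 0.009, 0.083),
    "Rationalization":          (0.059, 0.081, 0.247, 0.044),
    "Gender":                   (0.014, 0.037, 0.042, 0.034),
    "Health":                   (0.028, 0.020, 0.013, 0.035),
    "Learning attitude":        (0.046, 0.061, 0.043, 0.019),
    "Peer academic misconduct": (0.006, 0.060, 0.004, 0.029),
}
MODELS = ("GLM", "LR", "DT", "RF")


def ranking(model):
    """Attributes sorted from highest to lowest weight for one model."""
    col = MODELS.index(model)
    return sorted(WEIGHTS, key=lambda name: WEIGHTS[name][col], reverse=True)


for model in MODELS:
    print(model, ranking(model)[:2])  # two highest-weighted attributes
```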


4. CONCLUSION 

This research opens up research opportunities related to machine learning prediction in 
the education domain, particularly for academic misconduct. Machine learning algorithms learn iteratively 
from the prediction errors measured during the training phase: as the training rows are processed, 
the attribute weights of the model are adjusted mathematically 
until the best configuration is found. Based on the tested dataset, 
which focused on students from higher education institutions in Malaysia, the findings of this research showed 
that the FTT factors were more useful to the performance of the machine learning prediction models than 
the demographic factors. These findings raise further research questions that call for 
extensive work on both the machine learning methods and the attributes of the prediction models. 